ABSTRACT



A system for transferring data in a single clock cycle 
between a digital signal processor (DSP) and an external 
memory unit and method of same. The system includes a 
data transfer element coupled between the external memory 
unit and the DSP, where the data transfer element is adapted 
to transfer the data between the external memory unit and 
the DSP in a single clock cycle. In one embodiment, the data 
transfer element is a coprocessor including a plurality of 
latch devices coupled to buses between the DSP and the 
memory unit. A first set of data are transferred from a first 
memory unit (e.g., from either the DSP internal memory unit 
or the external memory unit, depending on the direction of 
the data transfer) into the coprocessor during a first clock 
cycle and out of the coprocessor to a second memory unit in 
a second clock cycle occurring immediately after the first 
clock cycle. Data subsequent to the first set are similarly 
transferred during each clock cycle occurring immediately 
thereafter, so that data are transferred out of the first memory 
unit and into the second memory unit each clock cycle. 

10 Claims, 6 Drawing Sheets

SYSTEM FOR PRIMING A LATCH BETWEEN TWO MEMORIES AND
TRANSFERRING DATA VIA THE LATCH IN SUCCESSIVE CLOCK
CYCLE THEREAFTER

TECHNICAL FIELD

The present invention pertains to the field of integrated
circuits. More specifically, the present invention pertains to
a system and method for optimizing memory exchanges
between a digital signal processor and an external memory.

BACKGROUND ART

Digital integrated circuits (e.g., processors, specifically
digital signal processors) used in computer systems are
increasingly powerful, and the rate at which they process
data continues to get faster. To maximize the functionality
and performance of the computer system, it is imperative
that the supply of data to the processor keep up with, to the
extent possible, the rate at which the data are required by the
application being executed by the processor.

A digital signal processor (DSP) system of the prior art is
the OAK™ DSP core licensed from DSP Semi Conductor by
VLSI Technology, Inc. In the OAK digital signal processor
system, the DSP core includes a digital signal processor and
internal memory (that is, memory that is on-core). The
internal memory, by virtue of being located on the DSP core,
is directly accessible by the DSP and thus able to transfer
data very quickly to the DSP. Hence, data contained in the
on-core memory are readily available to the DSP; therefore,
by using the data from internal memory, the application can
be optimally run at the speed of the processor. However, the
internal memory is relatively small and limited in size by the
on-core space that is available. In the OAK DSP core, for
example, there is typically a total of 4K of on-core memory
which is configured as two separate memories of 2K each.
This amount of memory is not sufficient to hold the large
quantities of data that are typically acquired and require
processing.

In the prior art, the shortcoming with regard to on-core
memory is addressed by supplementing the internal memory
with external, or off-core, memory. The external memory is
not limited by space considerations, and thus is capable of
providing the space needed to store larger quantities of data.
However, data stored in external memory need to be
retrieved from there and delivered to the DSP core in order
to be processed, and the processed data may need to be
subsequently returned to external memory. Thus, the
performance of the DSP system is limited by the speed at which
data can be transferred over the data bus from the external
memory to the DSP core, and likewise from the DSP core to
external memory.

In the prior art, each transfer of data from external
memory to internal memory, or from internal memory to
external memory, takes at least two (2) clock cycles. Thus,
in general, it takes 2N clock cycles to transfer N units (e.g.,
blocks or tables) of data. It is desirable to reduce the number
of clock cycles required to transfer a given amount of data,
so that data are transferred more quickly and overall system
performance is improved.

In addition, the prior art is problematic because the size of
the instruction sets (e.g., the code size) increases the size of
the memory and thus also increases the overall size of the
DSP system. Thus it is also desirable to reduce the size of the
instruction set.

Accordingly, what is needed is a method and/or system
that addresses the limitation placed on DSP system
performance by the need to transfer data from off-core memory to
on-core memory and by the rate at which those data are
transferred over the data bus. What is further needed is a
system and/or method that addresses the above need and
utilizes an efficient instruction set. The present invention
provides a novel solution to the above needs.

These and other objects and advantages of the present
invention will become obvious to those of ordinary skill in
the art after having read the following detailed description of
the preferred embodiments which are illustrated in the
various drawing figures.

DISCLOSURE OF THE INVENTION

The present invention provides a system and method that
addresses the limitation on digital signal processor (DSP)
system performance by reducing the number of clock cycles
required to transfer data between internal and external
memory. The present invention also reduces the size of the
instruction set, thereby reducing the size of the memory and
thus also reducing the overall size of the DSP system.

The present invention pertains to a system for transferring
data in a single clock cycle between a digital signal
processor (DSP) core and a memory unit, and method of same. The
system includes the memory unit, a plurality of buses
coupled to the memory unit, and the DSP core coupled to the
plurality of buses. The system also includes a data transfer
element coupled between the memory unit and the DSP core,
where the data transfer element is adapted to transfer the
data between the memory unit and the DSP core in a single
clock cycle. The present invention functions by pipelining
the data from the memory unit to the DSP core in a single
clock cycle after the pipeline has been primed.

In one embodiment, the memory unit is external to the
DSP core. In this embodiment, the data transfer element is
a coprocessor including a plurality of latch devices coupled
between the DSP core and the external memory unit via a
plurality of data buses, respectively. The latch devices
provide intermediate registers in the coprocessor for storing
the data being transferred between the DSP core and the
external memory unit. Data are transferred into the
coprocessor during a first clock cycle and out of the coprocessor
in a second clock cycle immediately following the first clock
cycle.

In the present embodiment, a first set of data are
transferred from one memory unit (e.g., from either the internal
memory unit of the DSP core or from the external memory
unit, depending on whether the transaction is a write
transaction or a read transaction) into the coprocessor during the
first clock cycle and out of the coprocessor to the other
memory unit (e.g., to either the external memory unit or the
internal memory unit of the DSP core, again depending on
whether the transaction is a write transaction or a read
transaction) in the second clock cycle occurring immediately
after the first clock cycle. Data subsequent to the first set are
likewise transferred from one memory unit to the
coprocessor during each consecutive clock cycle occurring
immediately after the first clock cycle, and from the coprocessor to
the other memory unit during each consecutive clock cycle
occurring immediately after the second clock cycle. Thus,
data are pipelined out of one memory unit and into the other
each clock cycle after the pipeline is primed.

In the present embodiment, an address bus is coupled
between the DSP core and the external memory unit, and an
address modification and decode mechanism is coupled to
the address bus. In this embodiment, the address
modification and decode mechanism is an offset register, wherein an
offset value is specified and applied in order to map a first
address in one memory unit to a second address in the other
memory unit (e.g., an address in the internal memory of the
DSP core is mapped to an address in the external memory,
and vice versa).

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in
and form a part of this specification, illustrate embodiments
of the invention and, together with the description, serve to
explain the principles of the invention:

FIG. 1 is a block diagram of a general purpose computer
system upon which embodiments of the present invention
may be implemented.

FIG. 2 is a block diagram of one embodiment of the data
transfer element (e.g., a coprocessor) used in accordance
with the present invention.

FIG. 3 is a block diagram of an address modification and
decode mechanism in accordance with one embodiment of
the present invention.

FIG. 4 is a flowchart of a process for transferring data
between a processor and a memory unit in accordance with
one embodiment of the present invention.

FIG. 5 is a timing cycle diagram illustrating a write
transaction from a processor to a memory unit in accordance
with one embodiment of the present invention.

FIG. 6 is a timing cycle diagram illustrating a read
transaction from a memory unit to a processor in accordance
with one embodiment of the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

Reference will now be made in detail to the preferred
embodiments of the invention, examples of which are
illustrated in the accompanying drawings. While the invention
will be described in conjunction with the preferred
embodiments, it will be understood that they are not
intended to limit the invention to these embodiments. On the
contrary, the invention is intended to cover alternatives,
modifications and equivalents, which may be included
within the spirit and scope of the invention as defined by the
appended claims. Furthermore, in the following detailed
description of the preferred embodiments of the present
invention, numerous specific details are set forth in order to
provide a thorough understanding of the present invention.
However, it will be obvious to one of ordinary skill in the art
that the present invention may be practiced without these
specific details. In other instances, well-known methods,
procedures, components, and circuits have not been
described in detail so as not to unnecessarily obscure aspects
of the present invention.

Some portions of the detailed descriptions which follow
are presented in terms of procedures, logic blocks,
processing, and other symbolic representations of operations
on data bits within a computer memory. These descriptions
and representations are the means used by those skilled in
the data processing arts to most effectively convey the
substance of their work to others skilled in the art. In the
present application, a procedure, logic block, process, or the
like, is conceived to be a self-consistent sequence of steps or
instructions leading to a desired result. The steps are those
requiring physical manipulations of physical quantities.
Usually, although not necessarily, these quantities take the
form of electrical or magnetic signals capable of being
stored, transferred, combined, compared, and otherwise
manipulated in a computer system. It has proven convenient
at times, principally for reasons of common usage, to refer
to these signals as transactions, bits, values, elements,
symbols, characters, fragments, pixels, or the like.

As used herein, "transaction" or "transfer" refers to the
transmission or receipt of data or other such message
information. The transaction or transfer may consist of all
data associated with a particular computer system operation
(e.g., a request or command). A transaction or transfer may
also consist of a block of data associated with a particular
operation; for example, a transfer of data may be broken
down into several blocks of data, each block transferred
prior to the transfer of a subsequent block, and each block
making up a transaction.

It should be borne in mind, however, that all of these and
similar terms are to be associated with the appropriate
physical quantities and are merely convenient labels applied
to these quantities. Unless specifically stated otherwise as
apparent from the following discussions, it is appreciated
that throughout the present invention, discussions utilizing
terms such as "processing," "operating," "calculating,"
"determining," "displaying," or the like, refer to actions and
processes of a computer system or similar electronic
computing device (e.g., process 400 of FIG. 4). The computer
system or similar electronic computing device manipulates
and transforms data represented as physical (electronic)
quantities within the computer system memories, registers
or other such information storage, transmission or display
devices.

Refer to FIG. 1 which illustrates an exemplary general
purpose computer system 190 in which the present invention
can be implemented. However, it is understood that
computer system 190 is an exemplary system and that other
computer system configurations may be used in accordance
with the present invention.

In general, computer system 190 used by the
embodiments of the present invention comprises bus 100 for
communicating information and digital signal processor 101
coupled with bus 100 for processing information and
instructions. In the present embodiment, digital signal
processor 101 is a digital signal processor (DSP) core such as
the OAK™ DSP core licensed from DSP Semi Conductor by
VLSI Technology, Inc., although it is appreciated that other
processor configurations may be used in accordance with the
present invention. In this embodiment, digital signal
processor 101 includes an internal memory unit or units (e.g.,
an on-core memory). In the present embodiment, digital
signal processor 101 includes two separate random access
memory (RAM) units (not shown).

Continuing with reference to FIG. 1, computer system
190 further comprises random access memory ("RAM
volatile") 102 coupled with bus 100 for storing information
and instructions for digital signal processor 101, read-only
memory ("ROM non-volatile") 103 coupled with bus 100
for storing static information and instructions for digital
signal processor 101, data storage device 104 such as a
magnetic or optical disk and disk drive coupled with bus 100
for storing information and instructions, display device 105
coupled to bus 100 for displaying information to the
computer user, optional alphanumeric input device 106 including
alphanumeric and function keys coupled to bus 100 for
communicating information and command selections to
digital signal processor 101, and cursor control device 107
coupled to bus 100 for communicating user input
information and command selections to digital signal processor 101.

The present invention is a data transfer element coupled
between digital signal processor 101 and external memory






US 6,442,671 B1


130 via bus 105 and bus 110. In the present embodiment, the
data transfer element is coprocessor 120. As will be seen
below in conjunction with FIG. 2, bus 105 can comprise a
plurality of address buses and data buses for coupling digital
signal processor 101 to different elements and devices that
are incorporated within coprocessor 120. Similarly, bus 110
can comprise a plurality of address buses and data buses for
coupling the elements and devices within coprocessor 120 to
external memory 130.

In general, external memory 130 represents a memory
unit external to digital signal processor 101 (that is, external
to the DSP core). In one embodiment, external memory 130
can be a cache memory coupled, for example, by a bus (not
shown) to the main memory (e.g., RAM volatile 102 and
ROM non-volatile 103) of computer system 190. In another
embodiment, external memory 130 can be a cache memory
or a register (not shown) located within the computer
system's main memory.

With reference now to FIG. 2, one embodiment of
coprocessor 120 is illustrated in accordance with the present
invention. As explained above, coprocessor 120 is coupled
between digital signal processor 101 (hereinafter, DSP 101)
and external memory 130. Coprocessor 120 introduces the
capability for direct memory access between DSP 101 and
external memory 130.

Coprocessor 120 is comprised of a plurality of latches
241, 242, 243, 245, 246 and 247. The function and operation
of latches are well known in the art. In the present
embodiment, the latches are 16-bit latches although it is
understood that different latch sizes may be used in
accordance with the present invention.

Latches 241 and 242 are coupled to the DXAP bus
represented by 251a, 251b, 251c and 251d (collectively
referred to herein as DXAP bus 251). DXAP bus 251 is an
address bus that is used to read or write an address between
DSP 101 and external memory 130. In the present
embodiment, DXAP bus 251 is a 16-bit bus and each address
is 16 bits in length, although it is understood that a bus and
address range other than 16 bits may be used in accordance
with the present invention.

Address modification and decode 270 is coupled to DXAP
bus 251. Address modification and decode 270 is used to
map an address in the internal memory of DSP 101 to an
address in external memory 130, and vice versa. Additional
information regarding address modification and decode 270
is provided below in conjunction with FIG. 3.

Continuing with reference to FIG. 2, latch 243 is coupled
to the portions of the GEXDBP bus represented by 250a and
250b, and latches 246 and 247 are coupled to the portions of
the GEXDBP bus represented by 255a, 255b and 255c.
External memory 130 is coupled to latches 243 and 247 by,
respectively, GEXDBP bus 250b and 255a. GEXDBP bus
250a is a data bus used for transferring data to and from DSP
101. GEXDBP bus 250b is a latched data bus for transferring
the data from latch 243 to external memory 130. For a write
transaction from on-core memory (from DSP 101) to
off-core memory (to external memory 130), latch 243 is
embodied as an external register referred to in the instruction set as
"ext0." GEXDBP buses 255a, 255b and 255c transfer data
from external memory 130 through latches 246 and 247,
respectively, to GEXDBP bus 250a. For a read transaction
from off-core memory to on-core memory, latches 246 and
247 are embodied as external registers likewise referred to
in the instruction set as "ext0." PEDSTN bus 252a and
PESRCN bus 252b are each coupled to latch 245, which is
coupled via bus 252c to ext0 decoder 280. Ext0 decoder 280
is coupled via bus 252a" to external memory 130. Ext0
decoder 280 is an address decoder of a type well known in
the art. Read and write signals are generated by decoding the
signal either from PEDSTN bus 252a or from PESRCN bus
252b in ext0 decoder 280. The PEDSTN signal indicates a
write transaction (e.g., ext0 write) to external memory 130,
and the PESRCN signal indicates a read transaction (e.g.,
ext0 read) from external memory 130.

FIG. 3 shows additional detail regarding address
modification and decode 270. Address modification and decode
270 includes offset register 310 and adder 320. In the
present embodiment, adder 320 is a 16-bit adder although it
is appreciated that a range other than 16 bits can be used in
accordance with the present invention.

To generate an address in external memory 130 using a
particular address from DSP 101, the address from DSP 101
is provided as input to address modification and decode 270
via DXAP bus 251c. Offset register 310 is initialized with an
offset value which is added to the address from DSP 101 to
map that address to an address in external memory 130. The
offset value in offset register 310 can be subsequently
changed during operation. Thus, data can be saved to a
selected location in external memory 130 by specifying the
appropriate offset value. By using address modification and
decode 270, two addresses are generated per clock cycle.

For example, data at address 0 in the internal memory of
DSP 101 is mapped to address 8000 in external memory 130
by specifying an offset value of 8000 in offset register 310.
Similarly, address 1 in internal memory would be mapped to
address 8001 in external memory 130, and so on for
subsequent addresses.

In this manner, the address from the internal memory of
DSP 101 is used to generate an address in external memory
130. In a similar manner, an address in external memory 130
is mapped to an address in DSP 101 by subtracting the offset
value from the address in external memory 130. Thus, in
accordance with the present invention, it is not necessary to
build an address generator in coprocessor 120 (FIG. 2),
thereby minimizing the number of gates needed in
coprocessor 120 and consequently reducing costs.

Refer now to FIG. 4, which is a flowchart of process 400
used to transfer data between DSP 101 and external memory
130 via coprocessor 120 in accordance with one
embodiment of the present invention. Process 400 is implemented
via instructions stored in and executed by DSP 101. Process
400 results in the transfer of data between DSP 101 and
external memory 130 in a single clock cycle. The timing
associated with process 400 is described further below in
conjunction with the timing cycle diagrams illustrated in
FIGS. 5 and 6.

In step 405 of FIG. 4, as explained in conjunction with
FIG. 3, an address in the source memory unit is mapped to
an address in the destination memory unit (e.g., one of either
internal memory of DSP 101 or external memory 130 is the
source memory unit and the other is the destination memory
unit, depending on whether the transaction is a write
transaction or a read transaction).

In step 410 of FIG. 4, with reference also to FIG. 2, data
are transferred from the source memory unit to coprocessor
120, which provides an intermediate location for the data
between the source memory unit and the destination
memory unit. For example, in a write transaction from DSP
101 to external memory 130, data are transferred from the
internal memory of DSP 101 to latch 243 via GEXDBP bus
250a.

In step 415 of FIG. 4, with reference also to FIG. 2, data
are transferred from coprocessor 120 to the destination
memory unit. Continuing with the example from above, data
are transferred from latch 243 to external memory unit 130
via GEXDBP bus 250b.

In step 420 of FIG. 4, if there is no more data to be 
transferred from the source memory unit, process 400 is 
complete. If more data are to be transferred, steps 405, 410 
and 415 are repeated for the next set of data. For the 
subsequent sets of data, in accordance with the present 
invention, step 410 and step 415 are performed at the same 
time. That is, consider two consecutive sets of data being 
transferred from the source memory unit to the destination 
memory unit. The first set of data is transferred to copro- 
cessor 120 and then to the destination memory unit. While 
the first set of data is transferred from coprocessor 120 to the 
destination memory unit, the second set of data is transferred 
to coprocessor 120 from the source memory unit. This 
sequence is repeated for subsequent sets of data. Thus, while 
one set of data is exiting from one end of the pipeline that 
runs between DSP 101 and external memory 130, at the 
other end of the pipeline the next set of data is entering the 
pipeline. 

Hence, in accordance with the present invention, the next 
set of data is transferred from the source memory unit to 
coprocessor 120 at the same time (that is, during the same 
clock cycle) that the preceding set of data is transferred from 
coprocessor 120 to the destination memory unit. Thus, 
during each clock cycle after the first clock cycle that 
corresponded to the first data transfer from the source 
memory unit to coprocessor 120, data are pipelined into the
destination memory unit. The first set of data takes two clock
cycles to be pipelined from the source memory unit to the 
destination memory unit, but data are transferred into the 
destination memory unit during each single clock cycle after 
the second clock cycle (e.g., after the pipeline has been 
primed). 
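
The overlap of step 410 and step 415 described above can be
illustrated with a short software model. The following is only a
minimal sketch, not the hardware itself: it assumes a single
one-entry latch standing in for coprocessor 120, and the names
pipelined_transfer, latch and pending are hypothetical and do not
appear in the specification.

```python
def pipelined_transfer(source):
    """Move every item of `source` through a one-entry latch.

    Each loop iteration models one clock cycle. Returns the
    destination contents and the number of cycles used.
    Assumes no item in `source` is None (None marks an empty latch).
    """
    destination = []
    latch = None              # stands in for the coprocessor's register
    pending = list(source)    # data still waiting in the source memory
    cycles = 0
    while pending or latch is not None:
        cycles += 1
        # Step 415: the set already in the latch leaves for the destination.
        if latch is not None:
            destination.append(latch)
        # Step 410: in the same cycle, the next set enters the latch.
        latch = pending.pop(0) if pending else None
    return destination, cycles


if __name__ == "__main__":
    moved, used = pipelined_transfer([10, 20, 30, 40])
    print(moved, used)  # four sets arrive in order after five cycles
```

In this model the first cycle only primes the latch, and every
later cycle delivers one set of data, which matches the behavior
described for the pipeline once it has been primed.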

Therefore, only N+1 clock cycles are required to transfer
N units (e.g., blocks or tables) of data in accordance with the 
present invention. This represents a significant reduction 
over the prior art in the number of clock cycles required to 
transfer the same amount of data. The present invention 
reduces the number of clock cycles needed to transfer a 
given amount of data by approximately one-half. 
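
The cycle counts stated above can be checked with simple
arithmetic. The sketch below merely restates the two formulas
from the text (2N for the prior art, N+1 for the pipelined
transfer); the function names are hypothetical and chosen for
illustration only.

```python
def prior_art_cycles(n_units):
    """Two clock cycles per unit, as stated for the prior art."""
    return 2 * n_units


def pipelined_cycles(n_units):
    """One priming cycle plus one cycle per unit (N+1)."""
    return n_units + 1 if n_units > 0 else 0


if __name__ == "__main__":
    for n in (1, 8, 1000):
        old, new = prior_art_cycles(n), pipelined_cycles(n)
        print(n, old, new, new / old)
```

For a single unit there is no saving (two cycles either way),
while for N = 1000 the ratio is 1001/2000, which is why the
reduction is described as approximately one-half for large
transfers.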

The instruction set for accomplishing steps 405, 410 and
415 is exemplified by the following instruction set, where r1
is a register inside DSP 101 that is the source for data to be
read to external memory 130, r2 is the offset register, r0 is
a register inside DSP 101 for receiving data written from
external memory 130, ext0 is as defined above, and N is the
number of units of data to be transferred:



;offset write
mov offset_reg_address, r2
mov offset_value, r1
mov r1, (r2)              [take content of r1, save it to address
                           pointed to by r2]
;on-core address to read
mov ##add_oncore, r1      [initialize first address in DSP of
                           data to be read]
;transfer from on-core to off-core
rep #N-1                  [repeat next instruction N-1 times]
mov (r1)+, ext0           [read from DSP and save in external
                           memory via ext0]
;read back                [source is external memory]
mov ##addp_write, r0      [initialize first address in DSP where
                           data is to be written]
mov ext0, (r0)            [dummy write to start data pipeline]
rep #N-1
mov ext0, (r0)+           [read from external memory and save
                           in DSP via ext0]



The instruction set utilized in accordance with the present 
invention is devised to minimize the number of instructions 
required in order to effectively execute the data transfer from 
source memory to destination memory. Consequently, the 
size of the memory and thus the overall size of the DSP 
system are reduced. 

Refer now to FIG. 5, which illustrates timing cycle 
diagram 500 for a write transaction from a processor (e.g., 
DSP 101 of FIG. 2) to a memory unit (e.g., external memory 
130 of FIG. 2) in accordance with one embodiment of the 
present invention data transfer element (e.g., coprocessor 
120). Clock cycles are generated by DSP 101 to synchronize 
operations occurring in DSP 101, coprocessor 120 and 
external memory 130. "Phi1" and "phi2" refer to the two
phases of each clock cycle. Thus, phi1 plus phi2 is
equivalent to a single clock cycle. In the timing cycle diagrams
herein, when phi2 is indicated as occurring then phi2 is high
and phi1 is low; and likewise, when phi1 is indicated as
occurring, then phi1 is high and phi2 is low.

With reference to both FIGS. 2 and 5, during the first phi2
phase, DXAP bus 251a ("dxap") indicates the address in
DSP 101 for the first set of data to be written to external
memory 130. In the first phi1 phase, PEDSTN bus 252a
("pedstn") indicates that the transaction is a write transaction
("Wr") to external memory 130. Also in the first phi1 phase,
DXAP bus 251b ("add_latch_1") takes the address from
DXAP bus 251a. Similarly, in the second phi2 phase, DXAP
bus 251c ("add_latch_2") takes the address from DXAP
bus 251b. Also in the second phi2 phase, GEXDBP bus 250a
("gexdbp") takes the data to be transferred from DSP 101 to
latch 243, and GEXDBP bus 250b ("data_latch") takes the
data from latch 243 to external memory 130.

Timing cycle diagram 500 illustrates that the transaction 
associated with the transfer of the first set of data begins in 
the first phi2 phase and is completed in the third phi2 phase, 
which is equivalent to two clock cycles. However, in accor- 
dance with the present invention, as the first set of data is 
exiting the pipeline between DSP 101 and external memory 
130, the second set of data is entering the pipeline. In other 
words, at any time there are two sets of data in the pipeline, 
one each at either end. Thus, although the first set of data 
takes two clock cycles to complete its transfer from DSP 101 
to external memory 130, each clock cycle thereafter another 
set of data completes its transfer because each subsequent 
transfer overlaps the preceding transfer by one clock cycle. 

Refer now to FIG. 6, which illustrates timing cycle 
diagram 600 for a read transaction from a memory unit (e.g., 
external memory 130) to a processor (e.g., DSP 101) in 
accordance with one embodiment of the present invention 
data transfer element (e.g., coprocessor 120). 

With reference to both FIGS. 2 and 6, during the first phi2
phase, DXAP bus 251a ("dxap") indicates the address in
internal memory that will be used to generate the address in
external memory 130 for the data that are to be written to
DSP 101. In the first phi1 phase, PESRCN bus 252b
("pesrcn") indicates that the transaction is a read transaction
("read_mem") from external memory 130. In the second
phi2 phase, bus 256a ("mem_out") takes the first set of data
from external memory 130, and the first set of data is passed
through latch 247. In the second phi1 phase, the first set of
data is passed through latch 246 to bus 255c ("data_latch").
In the third phi2 phase, GEXDBP bus 250a ("gexdbp") takes
the first set of data and transfers it to DSP 101.

The transfer of the first set of data thus starts in the second
phi2 phase and is completed in the third phi1 phase, which
is equivalent to two clock cycles. However, in accordance
with the present invention, the second set of data is prepared
concurrent with the transfer of the first set of data, and in the
next clock cycle follows the first set of data down the
pipeline from external memory 130 to DSP 101. That is,
looking at the signals on bus 256a ("mem_out"), the first set
of data is transferred out of external memory 130 starting in
the second phi2 phase. In the third phi2 phase, in accordance
with the present invention, a second set of data (not shown)
is transferred out of external memory 130 immediately
behind the first set of data. Thus, the two sets of data are
transferred out of external memory 130 two phases, or one
clock cycle, apart.

Looking now at the signals on GEXDBP bus 250a
("gexdbp"), the first set of data is transferred into DSP 101
during the third phi2 phase and third phi1 phase. The second
set of data is transferred into DSP 101 immediately behind
the first set of data (e.g., one clock cycle later). Thus, after
the first set of data has completed the transfer from external
memory 130 to DSP 101, subsequent sets of data each arrive
at DSP 101 every single cycle thereafter.

In summary, the present invention provides a system and 
method that improves digital signal processor system per- 
formance by reducing the number of clock cycles required 
to transfer data between the internal memory of the DSP 
core (e.g., on-core memory) and external memory (e.g., 
off-core memory). The present invention therefore allows 
the DSP core to advantageously utilize the expanded 
memory capability permitted by an external memory unit, 
because an external memory unit is not constrained by the 
space limitations associated with on-core memory. 

In the present embodiment, the present invention imple- 
ments a coprocessor (e.g., coprocessor 120 of FIG. 2) 
coupled between the DSP core and the external memory. The 
coprocessor accomplishes the transfer of data into either the 
internal memory or the external memory (depending on 
whether the transaction is a read or a write transaction) in a 
single clock cycle, thus reducing by approximately one-half 
the number of clock cycles needed to transfer a given 
amount of data. 
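
The "approximately one-half" figure can be checked with a short 
calculation. The sketch below assumes, as a baseline for comparison 
only, that a transfer without overlap costs two clock cycles per set 
of data, and that the pipelined transfer costs two clock cycles for 
the first set plus one for each set thereafter. The function names 
are hypothetical. 

```python
def cycles_without_overlap(n_sets, cycles_per_set=2):
    """Assumed baseline: every set takes the full two-cycle transfer."""
    return n_sets * cycles_per_set


def cycles_with_overlap(n_sets, first_set_latency=2):
    """Pipelined case: two cycles for the first set, then one per set."""
    if n_sets == 0:
        return 0
    return first_set_latency + (n_sets - 1)


n = 100
baseline = cycles_without_overlap(n)   # 200 cycles
pipelined = cycles_with_overlap(n)     # 101 cycles
print(baseline, pipelined, pipelined / baseline)
```

For 100 sets of data the ratio is 101/200, or about 0.505, and it 
approaches one-half as the number of sets grows. 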

The present invention also utilizes an address mechanism 
(e.g., address modification and decode 270 of FIG. 2) that 
permits an address in one memory unit to be mapped into the 
other memory unit without having to build an address 
generator in the coprocessor. 
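
As a minimal sketch of such a mapping, the example below adds a 
specified offset value to a DSP address to obtain the corresponding 
external memory address, which is the form of mapping recited in the 
claims. The class name AddressMapper, the method name to_external, 
and the example address and offset values are hypothetical and are 
chosen for illustration only. 

```python
class AddressMapper:
    """Hypothetical model of an offset-based address mapping."""

    def __init__(self, offset):
        # The specified offset value; it may be changed later to map
        # the same DSP address to a different external address.
        self.offset = offset

    def to_external(self, dsp_address):
        # The external memory address is the DSP address plus the offset.
        return dsp_address + self.offset


mapper = AddressMapper(offset=0x1000)
print(hex(mapper.to_external(0x0020)))   # 0x1020

mapper.offset = 0x4000                   # change the specified offset
print(hex(mapper.to_external(0x0020)))   # 0x4020
```

Because the mapping is a single addition applied to an address that 
the DSP already generates, the model needs no separate address 
generator. 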

The preferred embodiment of the present invention, a 
coprocessor for fast memory transaction, is thus described. 
While the present invention has been described in particular 
embodiments, it should be appreciated that the present 
invention should not be construed as limited by such 
embodiments, but rather construed according to the follow- 
ing claims. 

What is claimed is: 

1. A system for transferring data between a digital signal 
processor (DSP) and a memory unit external to said DSP, 
said system comprising: 
a memory unit, the memory unit external to said DSP; 
a plurality of buses coupled to said memory unit; 
the DSP coupled to said plurality of buses, said DSP for 
receiving data from said memory unit and for process- 
ing an application to generate therefrom data to be 
stored in said memory unit; and 
a data transfer element coupled between said memory unit 
and said DSP, wherein said data are transferred into 
said data transfer element during a first clock cycle and 
out of said data transfer element during a second clock 
cycle immediately following said first clock cycle, said 
data transfer element comprising: 
a first data bus coupled between said DSP and said 
memory unit, said first data bus for transferring a first 
set of data from said DSP to said memory unit; 
a first latch device coupled to said first data bus; 
a second data bus coupled between said DSP and said 
memory unit, said second data bus for transferring a 
second set of data from said memory unit to said 
DSP; and 

a second latch device coupled to said second data bus; 

said first latch device and said second latch device 
providing intermediate locations for storing said first 
and second sets of data between said DSP and said 
memory unit; 

an address bus coupled between said DSP and said 
memory unit; and 

an address mechanism coupled to said address bus, said 
address mechanism for mapping an address in said 
DSP to an address in said memory unit, wherein a 
specified offset value is added to said address in said 
DSP to generate said address in said memory unit. 

2. The system of claim 1 wherein said second data bus is 
coupled to said memory unit at one end and to said first data 
bus at the other end. 

3. The system of claim 1 further comprising a third latch 
device coupled to said second data bus, said third latch 
device providing an intermediate location for storing said 
data between said DSP and said memory unit. 

4. The system of claim 1 further comprising a fourth latch 
device and a fifth latch device coupled to said address bus. 

5. The system of claim 1 wherein said specified offset 
value is changed to a different value to map said address in 
said DSP to a different address in said memory unit. 

6. In a computer system comprising a digital signal 
processor (DSP), an external memory unit and a data trans- 
fer element coupled between said DSP and said external 
memory unit, a method for transferring data between an 
internal memory unit of said DSP and said external memory 
unit in a single clock cycle, said method implemented by 
said DSP executing instructions contained in said internal 
memory unit and comprising the steps of: 

a) mapping a first address in a first memory unit to a 
second address in a second memory unit, said step a) 
comprising: 

specifying an offset value; and 
applying said offset value to said first address to map 
said first address to said second address; 

b) transferring a first set of data from said first memory 
unit to a latch device of said data transfer element 
during a first clock cycle; 

c) transferring said first set of data from said latch device 
of said data transfer element to said second memory 
unit during a second clock cycle immediately following 
said first clock cycle; 
d) repeating said steps a) and b) during each consecutive 
clock cycle occurring after said first clock cycle for 
each set of data subsequent to said first set of data; and 

e) repeating said step c) during each consecutive clock 
cycle occurring after said second clock cycle for each 
set of data subsequent to said first set of data wherein 
said first memory unit and said second memory unit can 
be the DSP internal memory unit or the external 
memory unit, depending on the direction of data transfer. 

7. The method of claim 6 wherein said first memory unit 
is said external memory unit and said second memory unit 
is said internal memory unit. 



8. The method of claim 6 wherein said first memory unit 
is said internal memory unit and said second memory unit is 
said external memory unit. 

9. The method of claim 6 wherein said step a) is imple- 
mented by an address mechanism of said data transfer 
element. 

10. The method of claim 6 further comprising the step of 
changing said offset value to a different value to map said 
first address to a different address in said second memory 
unit. 

* * * * * 